The main goal of this paper is to develop a theory of inference of player valuations from observed data in
the generalized second price auction without relying on the Nash equilibrium assumption. Existing work in 
Economics on inferring agent values from data relies on the assumption that all participant strategies are 
best responses to the observed play of other players, i.e., they constitute a Nash equilibrium. In this paper,
we show how to perform inference relying on a weaker assumption instead: assuming that players are using 
some form of no-regret learning. Learning outcomes emerged in recent years as an attractive alternative to 
Nash equilibrium in analyzing game outcomes, modeling players who haven’t reached a stable equilibrium, 
but rather use algorithmic learning, aiming to learn the best way to play from previous observations. In this 
paper we show how to infer values of players who use algorithmic learning strategies. Such inference is an 
important first step before we move to testing any learning theoretic behavioral model on auction data. We 
apply our techniques to a dataset from Microsoft’s sponsored search ad auction system. 
1. INTRODUCTION

The standard approach in the econometric analysis of strategic interactions starts
by assuming that the participating agents, conditional on their private parameters,
such as their valuation in an auction, can fully optimize their utility given their
opponents’ actions, and that the system has arrived at a stable state of such mutual
best responses, i.e., a Nash equilibrium.

In recent years, learning outcomes have emerged as an important alternative to 
Nash equilibria. This solution concept is especially compelling in online environments, 
such as Internet auctions, as many such environments are best thought of as repeated 
strategic interactions in a dynamic and complex environment, where participants need 
to constantly update their strategies to learn how to play optimally. The strategies of 
agents in such environments evolve over time, as they learn to improve their strategies
and react to changes in the environment. Such learning behavior is reinforced
even more by the increasing use of sophisticated algorithmic bidding tools by most 
high revenue/volume advertisers. With auctions having emerged as the main source 
of revenue on the Internet, there are multitudes of interesting data sets for strategic 
agent behavior in repeated auctions. 
To be able to use the data on agent behavior to empirically test the predictions of
the theory based on learning agents, one first needs to infer the agents’ types, or
valuations of the items, from the observed behavior without relying on the stable state
best-response assumption. The idea behind inference based on the stable best-response
assumption is straightforward: the distribution of actions of players is observed in the
data. If we assume that each player best responds to the distribution of opponents’
actions, this best response can be recovered from the data. The best response function
effectively describes the preferences of the players, and can typically be inverted to
recover each player’s private type. This is the idea used in all existing work in economics
on this inference problem, including [Athey and Nekipelov 2010], [Bajari et al. 2013],
and [Xin Jiang and Leyton-Brown 2007].

There are several caveats in this approach. First of all, the assumption that an
equilibrium has been reached is unrealistic in many practical cases, either because
equilibrium best response functions are hard to compute or because the amount of
information that the players would need to exchange to reach an equilibrium outcome is
unrealistically large. Second, the equilibrium is rarely unique, especially in dynamic
settings. In that case the simple “inversion” of the best responses to obtain the
underlying valuations becomes a complicated computational problem because the set of
equilibria frequently has a complicated topological structure. The typical approach to
avoid this complication is to assume some equilibrium refinement, e.g. the Markov
Perfect Equilibrium in the case of dynamic games.

In spite of a common understanding in the Economics community that many practical
environments cannot be modeled using the traditional equilibrium framework (nor by
assuming that a particular equilibrium refinement holds), this point has been
overshadowed by the simplicity of using equilibrium-based models.

In this paper we consider a dynamic game where players learn how to play over
time. We focus on a model of sponsored search auctions where bidders compete for
advertising spaces alongside the organic search results. This environment is inherently
dynamic: an auction is run for every individual consumer search, and thus each
bidder with an active bid participates in a sequence of auctions.

Learning models agents that are new to the game, or participants who are adjusting
to a constantly evolving environment, where they have to constantly learn what may
be the best play. In these cases, strict best response to the observed past may not be a
good strategy: it is computationally expensive, and it can be far from optimal in a
dynamically changing environment. Instead of strict best response, players may want
to use a learning algorithm to choose their strategies.

No-regret learning has emerged as an attractive alternative to Nash equilibrium.
The notion of a no-regret learning outcome generalizes Nash equilibrium by requiring
the no-regret property of a Nash equilibrium, in an approximate sense, but more
importantly without the assumption that player strategies are stable and independent.
When considering a sequence of repeated plays, having no regret means that the
total value for the player over a sequence of plays is not much worse than the value
he/she would have obtained had he/she played the best single strategy throughout the
time, where the best possible single strategy is determined with hindsight based on the
environment and the actions of the other agents. There are many well-known, natural
no-regret learning algorithms, such as the weighted majority algorithm [Arora et al.
2012], [Littlestone and Warmuth 1994] (also known as Hedge [Freund and Schapire
1999]) and regret matching [S. Hart and Mas-Colell 2000], just to name a few simple
ones. We propose a theory of inference of agent valuations based only on the
assumption that the agents’ learning strategies are smart enough that they have minimal
regret, without making any assumptions on the particular no-regret algorithms they
employ.



There is a growing body of results in the algorithmic game theory literature
characterizing properties of no-regret learning outcomes in games, such as approximate
efficiency with respect to the welfare optimal outcome (see e.g. [Roughgarden 2012;
Syrgkanis and Tardos 2013]). For instance, [Caragiannis et al. 2015] consider the
generalized second price (GSP) auction in this framework, the auction also considered
in our paper, and show that the average welfare of any no-regret learning outcome is
always at least 30% of the optimal welfare. To be able to apply such theoretical results
to real data and to quantify the true inefficiency of GSP in the real world under
learning behavior, we first need a way to infer player valuations without relying on the
equilibrium assumption.

Our contribution. The main goal of this paper is to develop a theory of value
inference from observed data of repeated strategic interactions without relying on a Nash
equilibrium assumption. Rather than relying on the stability of outcomes, we make the
weaker assumption that players are using some form of no-regret learning. In a stable
Nash equilibrium outcome, players choose strategies independently, and their choices
are best responses to the environment and the strategies of other players. Choosing a
best response implies that the players have no regret for alternate strategy options.
We make the analogous assumption that, in a sequence of play, players have small
regret for any fixed strategy.

Our results do not rely on the assumption that the participating players have correct
beliefs regarding the future actions of their opponents or that they can even correctly
compute expectations over the future values of state variables. Our only assumption is
that the players understand the rules of the game. This is significantly different from
most of the current results in Industrial Organization, where estimation of dynamic
games requires that the players have correct beliefs regarding the actions of their
opponents. This is especially important to the analysis of bidder behavior in sponsored
search auctions, which are the core application of this paper. The sponsored search
marketplace is highly dynamic and volatile: the popularity of different search
terms is changing, and the auction platform continuously runs experiments. In this
environment advertisers continuously create new ads (corresponding to new bids) and
remove under-performing ads, learning the best way to bid while participating in
the game. In this setting the assumption of players who have correct beliefs regarding
their opponents and whose bids constitute an equilibrium may not be realistic.

When inferring player values from data, one always needs to accommodate small
errors. In the context of players who employ learning strategies, a small error $\epsilon > 0$
would mean that the player can have up to $\epsilon$ regret, i.e., the utility of the player from
the strategies used needs to be at most $\epsilon$ worse than that of any fixed strategy with
hindsight. Indeed, the guarantee provided by the standard learning algorithms is that the
total utility for the player is not much worse than that of any single strategy with
hindsight, with this error parameter decreasing over time, as the player spends more time
learning. In aiming to infer the player’s value from the observed data, we define the
rationalizable set $\mathcal{NR}$, consisting of the set of values and error parameters
$(v, \epsilon)$ such that with value $v$ the sequence of bids by the player would have at
most $\epsilon$ regret. We show that $\mathcal{NR}$ is a closed convex set. Clearly, allowing
for a larger error $\epsilon$ would result in a larger set of possible values $v$. The most
reasonable error parameter $\epsilon$ for a player depends on the quality of his/her learning
algorithm. We think of a rationalizable value $v$ for a player as a value that is
rationalizable with a small enough $\epsilon$.

Our main result provides a characterization of the rationalizable set $\mathcal{NR}$ for the
dynamic sponsored search auction game. We also provide a simple approach to compute
this set. We demonstrate that the evaluation of $\mathcal{NR}$ is equivalent to the evaluation
of a one-dimensional function which, in turn, can be computed from the auction data
directly by sampling. This result also allows us to show how much data is needed to
correctly estimate the rationalizable set for the dynamic sponsored search auction. We
show that when $N$ auction samples are observed in the data, the Hausdorff distance
between the estimated and the true sets $\mathcal{NR}$ is $O((N^{-1} \log N)^{\gamma/(2\gamma+1)})$,
where $\gamma$ is the sum of the number of derivatives of the allocation and pricing
functions and the degree of Hölder continuity of the highest-order derivative. In
particular, when the allocation and pricing functions are only Lipschitz-continuous, the
total number of derivatives is zero and the constant of Hölder continuity is 1, leading to
the rate $O((N^{-1} \log N)^{1/3})$.

This compares favorably with the result in [Athey and Nekipelov 2010],
where under the assumption of static Nash equilibrium in the sponsored search auction,
the valuations of bidders can be evaluated with error margin of order $O(N^{-1/3})$
when the empirical pricing and allocation functions are Lipschitz-continuous in the bid.
This means that our approach, which does not rely on the equilibrium properties,
provides a convergence rate which is only a $(\log N)^{1/3}$ factor away.

In Section 6 we test our methods on a Microsoft Sponsored Search Auction data set.
We show that our methods can be used to infer values on this real data, and study the
empirical consequences of our value estimation. We find that typical advertisers bid a
significantly shaded version of their value, shading it commonly by as much as 40%.
We also observe that each advertiser’s account consists of a constant fraction of listings
(e.g. bid keyword and ad pairs) that have tiny error and hence seem to satisfy the
best-response assumption, whilst the remaining large fraction has an error which is
spread out far from zero, thereby suggesting that bidding on these listings is still
in a transient learning phase. Further, we find that, among listings that appear to be
in the learning phase, the relative error (the error in regret relative to the player’s
value) is slightly positively correlated with the amount of shading. A higher error is
suggestive of an exploration phase of learning, and is consistent with attempting larger
shading of the bidder’s value, while advertisers with smaller relative regret appear to
shade their value less.

Further Related Work. There is a rich literature in Economics that is devoted to
inference in auctions based on equilibrium assumptions. [Guerre et al. 2000] studies
the estimation of values in static first-price auctions, with the extension to the cases
of risk-averse bidders in [Guerre et al. 2009] and [Campo et al. 2011]. The equilibrium
settings also allow inference in dynamic settings where the players participate in
a sequence of auctions, such as in [Jofre-Bonet and Pesendorfer 2003]. A survey of
approaches to inference is given in [Athey and Haile 2007]. These approaches have
been applied to the GSP and, more generally, to sponsored search auctions in
empirical settings in [Varian 2007] and [Athey and Nekipelov 2010].

Deviation from the “equilibrium best response” paradigm is much less common
in empirical studies. A notable exception is [Haile and Tamer 2003], where the
authors use two assumptions to bound the distribution of values in the first-price
auction. The first assumption is that the bid should not exceed the value. That allows
one to bound the order statistics of the value distribution from below by the
corresponding order statistics of the observed bid distribution. The second assumption is
that there is a minimum bid increment and if a bidder was outbid by another bidder then
her value is below the bid that made her drop out by at least the bid increment. That
allows one to provide the bound from above. The issue with these bounds is that in many
practical instances they are very wide and they do not directly translate to dynamic
settings. The computation of the set of values compatible with the model may
be complicated even in equilibrium settings. For instance, in [Aradillas-López et al.
2013] the authors consider estimation of the distribution of values in the ascending
auction when values may be correlated. It turns out that even if we
assume that the bidders have perfect beliefs in the static model with correlated values,
the inference problem becomes computationally challenging.

2. NO-REGRET LEARNING AND SET INFERENCE 

Consider a game $G$ with a set $N$ of $n$ players. Each player $i$ has a strategy space $B_i$.
The utility of a player depends on the strategy profile $b \in B_1 \times \ldots \times B_n$, on a
set of parameters $\theta \in \Theta$ that are observable in the data and on private parameters
$v_i \in V_i$ observable only by each player $i$. We denote with $U_i(b; \theta, v_i)$ the utility
of a player $i$.

We consider a setting where game $G$ is played repeatedly. At each iteration $t$, each
player picks a strategy $b_i^t$ and nature picks a set of parameters $\theta^t$. The private
parameter $v_i$ of each player $i$ remains fixed throughout the sequence. We will denote with
$\{b\}_t$ the sequence of strategy profiles and with $\{\theta\}_t$ the sequence of nature’s
parameters. We assume that the sequence of strategy profiles $\{b\}_t$ and the sequence of
nature’s parameters $\{\theta\}_t$ are observable in the data. However, the private parameter
$v_i$ for each player $i$ is not observable. The inference problem we consider in this paper
is the problem of inferring these private values from the observable data.

We will refer to the two observable sequences as the sequence of play. In order to be
able to infer anything about agents’ private values, we need to make some rationality
assumption about the way agents choose their strategies. Classical work on inference
assumes that each agent best responds to the statistical properties of the observable
environment and the strategies of other players; in other words, it assumes that game
play is at a stable Nash equilibrium. In this paper, we replace this equilibrium
assumption with the weaker no-regret assumption, stating that the utility obtained
throughout the sequence is at least as high as any single strategy $b_i$ would have yielded,
if played at every time step. If the play is stable throughout the sequence, no-regret is
exactly the property required for the play to be at Nash equilibrium. However,
no-regret can be reached by many natural learning algorithms without prior information,
which makes this a natural rationality assumption for agents who are learning how
to best play in a game while participating. More formally, we make no assumption
about what agents do to learn, but rather will assume that agents learn well enough to
satisfy the following no-regret property with a small error.

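A sequence of play that we observe has $\epsilon_i$-regret for advertiser $i$ if:

\[ \forall b' \in B_i : \frac{1}{T} \sum_{t=1}^{T} U_i(b^t; \theta^t, v_i) \ge \frac{1}{T} \sum_{t=1}^{T} U_i(b', b_{-i}^t; \theta^t, v_i) - \epsilon_i \tag{1} \]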

This leads to the following definition of a rationalizable set under no-regret learning. 

Definition 2.1 (Rationalizable Set). A pair $(\epsilon_i, v_i)$ of a value $v_i$ and error
$\epsilon_i$ is a rationalizable pair for player $i$ if it satisfies Equation (1). We refer to the
set of such pairs as the rationalizable set and denote it with $\mathcal{NR}$.
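
To make the definition concrete, the following minimal sketch checks whether a candidate pair of error and value is rationalizable for a finite sequence of play. It is only an illustration under stated assumptions: the function names, the finite list of alternative strategies (standing in for the full strategy space), and the toy single-slot second-price utility are all hypothetical and not part of the paper.

```python
def average_utility(bids, others, thetas, value, utility):
    """Average per-period utility of a bid sequence (one side of the regret inequality)."""
    T = len(bids)
    return sum(utility(bids[t], others[t], thetas[t], value) for t in range(T)) / T


def average_regret(bids, others, thetas, value, utility, alternatives):
    """Largest average gain from switching to one fixed alternative in hindsight.

    `alternatives` is a finite grid standing in for the strategy space (an assumption).
    """
    realized = average_utility(bids, others, thetas, value, utility)
    best_fixed = max(
        average_utility([alt] * len(bids), others, thetas, value, utility)
        for alt in alternatives
    )
    return best_fixed - realized


def is_rationalizable(eps, value, bids, others, thetas, utility, alternatives):
    """True if the pair (eps, value) satisfies the no-regret inequality on the grid."""
    return average_regret(bids, others, thetas, value, utility, alternatives) <= eps


def toy_utility(bid, other_bid, theta, value):
    """Hypothetical toy game: single slot, second price; theta is unused."""
    return value - other_bid if bid > other_bid else 0.0
```

For example, a player with value 3 who always bids 1 against a competing bid of 2 never wins, while the fixed bid 2.5 would have earned 1 per period; such a sequence is therefore rationalizable with error 1 but not with error 0.5.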

The rationality assumption of inequality (1) models players who may be learning
from experience while participating in the game. We assume that the strategies
$b_i^t$ and nature’s parameters $\theta^t$ are picked simultaneously, so agent $i$ cannot pick his
strategy dependent on the state of nature $\theta^t$ or the strategies of other agents $b_{-i}^t$.
This makes the standard of a single best strategy $b_i$ natural, as chosen strategies cannot
depend on $\theta^t$ or $b_{-i}^t$. Beyond this, we do not make any assumption on what information
is available to the agents, or on how they choose their strategies. Some learning
algorithms achieve this standard of learning with very minimal feedback (only the value
experienced by the agent as he/she participates). If the agents know a distribution of
possible nature parameters $\theta$ or are able to observe the past values of the parameters
$\theta^t$ or the strategies $b_{-i}^t$ (or both), then they can use this observed past information
to select their strategy at time step $t$. Such additional information is clearly useful in
speeding up the learning process for agents. We will make no assumption on what
information is available to agents for learning, or what algorithms they may be using to
update their strategies. We will simply assume that they use algorithms that achieve
the no-regret (small regret) standard expressed in inequality (1).

For general games and general private parameter spaces, the rationalizable set can 
be an arbitrary set with no good statistical or convexity properties. Our main result is 
to show that for the game of sponsored search auction we are studying in this paper, 
the set is convex and has good convergence properties in terms of estimation error. 


3. SPONSORED SEARCH AUCTIONS MODEL 

We consider data generated by advertisers repeatedly participating in sponsored
search auctions. The game $G$ that is being repeated at each stage is an instance of a
generalized second price auction triggered by a search query.

The rules of each auction are as follows^1: Each advertiser $i$ is associated with a click
probability $\gamma_i$ and a scoring coefficient $s_i$ and is asked to submit a bid-per-click $b_i$.
Advertisers are ranked by their rank-score $q_i = s_i \cdot b_i$ and allocated positions in
decreasing order of rank-score as long as they pass a rank-score reserve $r$. If advertisers
also pass a higher mainline reserve $m$, then they may be allocated in the positions that
appear in the mainline part of the page, but at most $k$ advertisers are placed on the
mainline.

If advertiser $i$ is allocated position $j$, then he is clicked with some probability $p_{ij}$,
which we will assume to be separable into a part $\alpha_j$ depending on the position and a
part $\gamma_i$ depending on the advertiser, and that the position related effect is the same in
all the participating auctions:

\[ p_{ij} = \alpha_j \cdot \gamma_i \tag{2} \]

We denote with $\alpha = (\alpha_1, \ldots, \alpha_m)$ the vector of position coefficients. All the
mentioned sets of parameters $\theta = (s, b, \gamma, r, m, k, \alpha)$ are observable in the data.

If advertiser $i$ is allocated position $j$, then he pays only when he is clicked and his
payment, i.e. his cost-per-click (CPC), is the minimal bid he had to place to keep his
position, which is:

\[ c_{ij}(b; \theta) = \max\left\{ \frac{s_{\pi(j+1)} \cdot b_{\pi(j+1)}}{s_i}, \frac{r}{s_i}, \frac{m}{s_i} \cdot 1\{j \in M\} \right\} \tag{3} \]

where by $\pi(j)$ we denote the advertiser that is allocated position $j$ and with $M$ we
denote the set of mainline positions.

We also assume that each advertiser has a value-per-click (VPC) $v_i$, which is not
observed in the data. If under a bid profile $b$, advertiser $i$ is allocated slot $\sigma_i(b)$, his
expected utility is:

\[ U_i(b; \theta, v_i) = \alpha_{\sigma_i(b)} \cdot \gamma_i \cdot \left( v_i - c_{i\sigma_i(b)}(b; \theta) \right) \tag{4} \]

We will denote with:

\[ P_i(b; \theta) = \alpha_{\sigma_i(b)} \cdot \gamma_i \tag{5} \]

the probability of a click as a function of the bid profile and with:

\[ C_i(b; \theta) = \alpha_{\sigma_i(b)} \cdot \gamma_i \cdot c_{i\sigma_i(b)}(b; \theta) \tag{6} \]

the expected payment as a function of the bid profile. Then the utility of advertiser $i$
at each auction is:

\[ U_i(b; \theta, v_i) = v_i \cdot P_i(b) - C_i(b) \tag{7} \]
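
The auction rules and the utility decomposition above can be illustrated with a short sketch. It is a simplification under stated assumptions, not the platform's implementation: the mainline reserve is dropped (all positions are treated alike, as if $m = 0$ and $M$ were empty), ties in rank-score are broken by advertiser index, and all function and argument names are hypothetical.

```python
def gsp_auction(bids, scores, gammas, alphas, reserve):
    """One simplified GSP auction (no mainline reserve; an assumption of this sketch).

    Returns {advertiser: (position, click_probability, cost_per_click)} for the
    advertisers that receive a position.
    """
    n = len(bids)
    rank_scores = [scores[i] * bids[i] for i in range(n)]
    # Advertisers that pass the rank-score reserve, in decreasing order of rank-score.
    eligible = sorted(
        (i for i in range(n) if rank_scores[i] >= reserve),
        key=lambda i: -rank_scores[i],
    )
    outcome = {}
    for j, i in enumerate(eligible[: len(alphas)]):
        # Rank-score of the next eligible advertiser (0 if there is none).
        next_score = rank_scores[eligible[j + 1]] if j + 1 < len(eligible) else 0.0
        # Minimal bid needed to keep position j: beat the next score and the reserve.
        cpc = max(next_score, reserve) / scores[i]
        outcome[i] = (j, alphas[j] * gammas[i], cpc)
    return outcome


def expected_utility(value, click_probability, cost_per_click):
    """Click probability times (value per click minus cost per click)."""
    return click_probability * (value - cost_per_click)
```

With three advertisers bidding 3, 2 and 1, unit scores, click probabilities 0.5, two positions with coefficients 1.0 and 0.5 and reserve 0.5, the top advertiser gets position 0 at a cost-per-click of 2, and with a value of 4 his expected utility is 0.5 · (4 − 2) = 1.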


^1 We ignore small details of the auction rules that are rarely in effect.



The latter fits in the general repeated game framework, where the strategy space
$B_i = \mathbb{R}_+$ of each player $i$ is simply the set of non-negative real numbers. The
private parameter of a player is an advertiser’s value-per-click (VPC) $v_i$ and the set of
parameters that affect a player’s utility at each auction and are observable in the data is
$\theta$. At each auction $t$ in the sequence the observable parameters $\theta^t$ can take arbitrary
values that depend on the specific auction. However, we assume that the per-click value of
the advertiser remains fixed throughout the sequence.

Batch Auctions. Rather than viewing a single auction as a game that is being
repeated, we will view a batch of many auctions as the game that is repeated in each
stage. This choice is reasonable, as it is impossible for advertisers to update their bid
after each auction. Thus the utility of a single stage of the game is simply the
average of the utilities of all the auctions that the player participated in during this time
period. Another way to view this setting is that the parameter $\theta$ at each iteration $t$ is
not deterministic but rather is drawn from some distribution $D^t$ and a player’s utility
at each iteration is the expected utility over $D^t$. In effect, the distribution $D^t$ is the
observable parameter, and utility depends on the distribution, and not only on a single
draw from the distribution. With this in mind, the per-period probability of click is

\[ P_i^t(b^t) = E_{\theta \sim D^t}\left[ P_i(b^t; \theta) \right] \tag{8} \]

and the per-period cost is

\[ C_i^t(b^t) = E_{\theta \sim D^t}\left[ C_i(b^t; \theta) \right]. \tag{9} \]

We will further assume that the volume of traffic is the same in each batch, so the
average utility of an agent over a sequence of stages is expressed by

\[ \frac{1}{T} \sum_{t=1}^{T} \left( v_i \cdot P_i^t(b^t) - C_i^t(b^t) \right). \tag{10} \]

Due to the large volume of auctions that can take place in between these time
periods of bid changes, it is sometimes impossible to consider all the auctions and
calculate the true empirical distribution $D^t$. Instead it is more reasonable to
approximate $D^t$ by taking a sub-sample of the auctions. This would lead only to statistical
estimates of both of these quantities. We will denote these estimates by $\hat{P}_i^t(b)$ and
$\hat{C}_i^t(b)$ respectively. We will analyze the statistical properties of the estimated
rationalizable set under such subsampling in Section 5.


4. PROPERTIES OF RATIONALIZABLE SET FOR SPONSORED SEARCH AUCTIONS 

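For the auction model that we are interested in, Equation (1), which determines whether
a pair $(\epsilon, v)$ is rationalizable, boils down to:

\[ \forall b' \in \mathbb{R}_+ : v \cdot \frac{1}{T} \sum_{t=1}^{T} \left( P_i^t(b', b_{-i}^t) - P_i^t(b^t) \right) \le \frac{1}{T} \sum_{t=1}^{T} \left( C_i^t(b', b_{-i}^t) - C_i^t(b^t) \right) + \epsilon \tag{11} \]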


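If we denote with

\[ \Delta P(b') = \frac{1}{T} \sum_{t=1}^{T} \left( P_i^t(b', b_{-i}^t) - P_i^t(b^t) \right) \tag{12} \]

the increase in the average probability of click from player $i$ switching to a fixed
alternate bid $b'$ and with

\[ \Delta C(b') = \frac{1}{T} \sum_{t=1}^{T} \left( C_i^t(b', b_{-i}^t) - C_i^t(b^t) \right) \tag{13} \]

the increase in the average payment from player $i$ switching to a fixed alternate bid $b'$,
then the condition simply states:

\[ \forall b' \in \mathbb{R}_+ : v \cdot \Delta P(b') \le \Delta C(b') + \epsilon \tag{14} \]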

Hence, the rationalizable set $\mathcal{NR}$ is an envelope of the family of half planes obtained
by varying $b' \in \mathbb{R}_+$ in Equation (14).
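
As an illustration of how the two quantities entering each half-plane constraint could be computed, the sketch below averages the per-period changes in click probability and payment from switching to a fixed alternate bid. It assumes, hypothetically, that each period is represented by a pair of functions of the player's own bid (the opponents' bids of that period are already fixed inside them); the helper names and the toy period functions are not from the paper.

```python
def make_period(a):
    """Hypothetical toy period: click probability a*b and expected payment a*b^2."""
    return (lambda b: a * b, lambda b: a * b * b)


def deltas(periods, observed_bids, alt_bid):
    """Average change in click probability and payment from a fixed alternate bid.

    `periods[t]` is a pair (P_t, C_t) of functions of the player's own bid.
    Returns the pair (delta_P, delta_C) for the alternate bid `alt_bid`.
    """
    T = len(periods)
    d_p = sum(P(alt_bid) - P(observed_bids[t]) for t, (P, _) in enumerate(periods)) / T
    d_c = sum(C(alt_bid) - C(observed_bids[t]) for t, (_, C) in enumerate(periods)) / T
    return d_p, d_c
```

A pair of error and value is then rationalizable when the value times the first quantity is at most the second quantity plus the error, for every alternate bid considered.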

We conclude this section by characterizing some basic properties of this set, which
will be useful both in analyzing its statistical estimation properties and in computing
the empirical estimate of $\mathcal{NR}$ from the data.

Lemma 4.1. The set $\mathcal{NR}$ is a closed convex set.


Proof. The Lemma follows from the fact that $\mathcal{NR}$ is defined by a set of linear
constraints. Any convex combination of two points in the set will also satisfy these
constraints, by taking the convex combination of the constraints that the two points have
to satisfy. The closedness follows from the fact that points that satisfy the constraints
with equality are included in the set. □


Lemma 4.2. For any error level $\epsilon$, the set of values that belong to the rationalizable
set is characterized by the interval:

\[ v \in \left[ \max_{b' : \Delta P(b') < 0} \frac{\Delta C(b') + \epsilon}{\Delta P(b')}, \; \min_{b' : \Delta P(b') > 0} \frac{\Delta C(b') + \epsilon}{\Delta P(b')} \right] \tag{15} \]

In particular the data are not rationalizable for error level $\epsilon$ if the latter interval is
empty.


Proof. Follows from Equation (14), by dividing by $\Delta P(b')$ and taking cases on
whether $\Delta P(b')$ is positive or negative. □
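
The interval of Lemma 4.2 can be evaluated directly once the changes in click probability and payment are known on a finite grid of alternate bids. The sketch below is a minimal illustration under that assumption; the function name and the grid representation are hypothetical, and the handling of alternate bids with zero change in click probability (where the constraint no longer involves the value) is added here for completeness.

```python
import math


def rationalizable_interval(delta_p, delta_c, eps):
    """Interval of values v with v * dP <= dC + eps for every grid point.

    `delta_p` and `delta_c` list the changes in click probability and payment for
    each alternate bid on a finite grid. Returns (lower, upper), or None if no
    value is rationalizable at error level `eps`.
    """
    lower, upper = -math.inf, math.inf
    for d_p, d_c in zip(delta_p, delta_c):
        if d_p > 0:
            upper = min(upper, (d_c + eps) / d_p)   # v <= (dC + eps) / dP
        elif d_p < 0:
            lower = max(lower, (d_c + eps) / d_p)   # dividing by dP < 0 flips the sign
        elif d_c + eps < 0:
            return None                             # 0 <= dC + eps fails for every v
    return (lower, upper) if lower <= upper else None
```

For instance, with changes in click probability (−0.25, 0, 0.5) and in payment (−0.75, 0, 1.0), the error level 0.25 gives the interval [2, 2.5], while the error level 0 gives an empty interval, so the data are not rationalizable without error.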

To be able to estimate the rationalizable set $\mathcal{NR}$ we will need to make a few
additional assumptions. All assumptions are natural, and likely satisfied by any data.
Further, the assumptions are easy to test for in the data, and the parameters needed
can be estimated.

Since $\mathcal{NR}$ is a closed convex set, it is conveniently represented by the set of support
hyperplanes defined by the functions $P_i^t(\cdot, \cdot)$ and $C_i^t(\cdot, \cdot)$. Our first assumption
is on the functions $P_i^t(\cdot, \cdot)$ and $C_i^t(\cdot, \cdot)$.


Assumption 1. The support of bids is a compact set $B = [0, \bar{b}]$. For each bid
vector $b_{-i}^t$, the functions $P_i^t(\cdot, b_{-i}^t)$ and $C_i^t(\cdot, b_{-i}^t)$ are continuous, monotone
increasing and bounded on $B$.


Remark. These assumptions are natural and are satisfied by our data. In most
applications, it is easy to have a simple upper bound $M$ on the maximum conceivable
value an advertiser can have for a click, and with this maximum, we can assume that
the bids are also limited to this maximum, so we can use $\bar{b} = M$ as a bound for the
maximum bid. The probability of getting a click as well as the cost per click are clearly
increasing functions of the agent’s bid in the sponsored search auctions considered in
this paper, and the continuity of these functions is a good model in large or uncertain
environments.

We note that the properties in Assumption 1 are also satisfied by
linear combinations of the functions, as a linear combination of monotone functions
is monotone, implying that the functions $\Delta P(b)$ and $\Delta C(b)$ are also monotone and
continuous.



The assumption further implies that for any value $v$, there exists at least one
element of $B$ that maximizes $v \Delta P(b) - \Delta C(b)$, as $v \Delta P(b) - \Delta C(b)$ is a continuous
function on a compact set.

4.1. Properties of the Boundary

Now we can study the properties of the set $\mathcal{NR}$ by characterizing its boundary, denoted
$\partial \mathcal{NR}$.

Lemma 4.3. Under Assumption 1, $\partial \mathcal{NR} = \{ (v, \epsilon) : \sup_b ( v \Delta P(b) - \Delta C(b) ) = \epsilon \}$.

Proof. (1) Suppose that $(v, \epsilon)$ solves $\sup_b ( v \Delta P(b) - \Delta C(b) ) = \epsilon$. Given the
continuity of the functions $\Delta P(b)$ and $\Delta C(b)$, the supremum exists and it is attained at
some $b^*$ in the support of bids. Take some $\delta > 0$. Taking the point $(v', \epsilon')$ such that
$v' = v$ and $\epsilon' = \epsilon - \delta$ ensures that $v' \Delta P(b^*) - \Delta C(b^*) > \epsilon'$ and thus
$(v', \epsilon') \notin \mathcal{NR}$. Taking the point $(v'', \epsilon'')$ such that $v'' = v$ and
$\epsilon'' = \epsilon + \delta$ ensures that

\[ \sup_b \left( v'' \Delta P(b) - \Delta C(b) \right) < \epsilon'', \]

and thus for any bid $b$: $v'' \Delta P(b) - \Delta C(b) < \epsilon''$. Therefore $(v'', \epsilon'') \in \mathcal{NR}$.
Since $\delta$ was arbitrary, this ensures that in any neighborhood of $(v, \epsilon)$ there are
points that belong to the set $\mathcal{NR}$ and points that do not belong to it. This means that
this point belongs to $\partial \mathcal{NR}$.

(2) We prove the sufficiency part by contradiction. Suppose that $(v, \epsilon) \in \partial \mathcal{NR}$ and
$\sup_b ( v \Delta P(b) - \Delta C(b) ) \ne \epsilon$. If $\sup_b ( v \Delta P(b) - \Delta C(b) ) > \epsilon$, given that the
objective function is bounded and has a compact support, there exists a point $b^*$ where the
supremum is attained. For this point $v \Delta P(b^*) - \Delta C(b^*) > \epsilon$. This means that for this
bid the imposed inequality constraint is not satisfied and this point does not belong to
$\mathcal{NR}$, which contradicts our assumption.

Suppose then that $\sup_b ( v \Delta P(b) - \Delta C(b) ) < \epsilon$. Let
$\Delta\epsilon = \epsilon - \sup_b ( v \Delta P(b) - \Delta C(b) )$. Take some $\delta < \Delta\epsilon$ and take arbitrary
$\delta_1, \delta_2$ such that $\max\{ |\delta_1 \sup_b \Delta P(b)|, |\delta_2| \} < \delta/2$.
Let $v' = v + \delta_1$ and $\epsilon' = \epsilon + \delta_2$. Given that

\[ \sup_b \left( v \Delta P(b) - \Delta C(b) \right) \ge \sup_b \left( (v + \delta_1) \Delta P(b) - \Delta C(b) \right) - \sup_b \Delta P(b) \, \delta_1, \]

for any $\tau \in (-\infty, \Delta\epsilon)$ we have

\[ \sup_b \left( v' \Delta P(b) - \Delta C(b) \right) < \epsilon - \tau + \delta_1 \sup_b \Delta P(b). \]

Given that $|\delta_1 \sup_b \Delta P(b)| < \delta/2 < \Delta\epsilon/2$, for any $|\delta_2| < \delta/2$ we can find
$\tau$ with $-\tau + \delta_1 \sup_b \Delta P(b) = \delta_2$. Therefore

\[ \sup_b \left( v' \Delta P(b) - \Delta C(b) \right) < \epsilon', \]

and thus for any $b$: $v' \Delta P(b) - \Delta C(b) < \epsilon'$. Therefore $(v', \epsilon') \in \mathcal{NR}$. This
means that for $(v, \epsilon)$ we constructed a neighborhood of size $\delta$ such that each element of
that neighborhood is an element of $\mathcal{NR}$. Thus $(v, \epsilon)$ cannot belong to $\partial \mathcal{NR}$. □
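
Lemma 4.3 says that the boundary of the rationalizable set is the graph of a one-dimensional function of the value: the smallest error that rationalizes it. A minimal sketch of this function on a finite grid of alternate bids follows; the grid representation and the function names are assumptions made for illustration only.

```python
def min_error(v, delta_p, delta_c):
    """Smallest error that rationalizes value v: max over the grid of v*dP - dC."""
    return max(v * d_p - d_c for d_p, d_c in zip(delta_p, delta_c))


def in_rationalizable_set(v, eps, delta_p, delta_c):
    """True if (v, eps) lies on or above the boundary, i.e. inside the set."""
    return min_error(v, delta_p, delta_c) <= eps
```

Because the function is a maximum of functions that are linear in the value, it is convex, which is another way to see that the set of pairs lying above its graph is convex.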

The next step will be to establish the properties of the boundary by characterizing
the solutions of the optimization problem of selecting the optimal single bid
$b$ for a given value $v$ and the sequence of bids by other agents, defined by
$\sup_b ( v \Delta P(b) - \Delta C(b) )$.

Lemma 4.4. Let $b^*(v) = \arg\sup_b ( v \Delta P(b) - \Delta C(b) )$. Then $b^*(\cdot)$ is
upper-hemicontinuous and monotone.



Proof. By Assumption 1 the function $v \Delta P(b) - \Delta C(b)$ is continuous in $v$ and $b$,
so the upper hemicontinuity of $b^*(\cdot)$ follows directly from Berge’s maximum theorem.

To show that $b^*(\cdot)$ is monotone consider the function $q(b; v) = v \Delta P(b) - \Delta C(b)$. We
show that this function is supermodular in $(b; v)$, that is, for $b' > b$ and $v' > v$ we have

\[ q(b'; v') + q(b; v) \ge q(b'; v) + q(b; v'). \]

To see this observe that if we take $v' > v$, then

\[ q(b; v') - q(b; v) = (v' - v) \Delta P(b), \]

which is non-decreasing in $b$ due to the monotonicity of $\Delta P(\cdot)$, implying that $q$ is
supermodular. Now we can apply Topkis’ monotonicity theorem ([Topkis 1998], [Vives 2001]),
from which we immediately conclude that $b^*(\cdot)$ is non-decreasing. □
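
The monotonicity of the optimal fixed bid can be checked numerically on a toy example. The sketch below is illustrative only: the two toy functions (an increasing change in click probability and an increasing change in payment on the unit interval), the bid grid and the function names are hypothetical, and ties are broken in favor of the lowest bid.

```python
def toy_delta_p(b):
    """Hypothetical increasing change in click probability on [0, 1]."""
    return b - 0.5


def toy_delta_c(b):
    """Hypothetical increasing change in payment on [0, 1]."""
    return b * b - 0.25


def best_fixed_bid(v, grid, delta_p, delta_c):
    """Grid maximizer of v * dP(b) - dC(b); the lowest maximizer wins ties."""
    return max(grid, key=lambda b: v * delta_p(b) - delta_c(b))


GRID = [0.0, 0.25, 0.5, 0.75, 1.0]
```

In this toy example the objective is maximized at half the value, so increasing the value from 0.5 to 1.0 to 1.5 moves the optimal fixed bid from 0.25 to 0.5 to 0.75, in line with the lemma.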

Lemma 4.4 gives us a powerful result on the monotonicity of the optimal response
function $b^*(v)$, which relies only on the boundedness, compact support and monotonicity
of the functions $\Delta P(\cdot)$ and $\Delta C(\cdot)$, without relying on their differentiability. The
downside of this is that $b^*(\cdot)$ can be set-valued or change discontinuously in the value.
To avoid this we impose the following additional assumption.


Assumption 2. For each $b_1$ and $b_2 > 0$ the incremental cost per click function

\[ ICC(b_2, b_1) = \frac{\Delta C(b_2) - \Delta C(b_1)}{\Delta P(b_2) - \Delta P(b_1)} \]

is continuous in $b_1$ for each $b_2 \ne b_1$ and it is continuous in $b_2$ for each $b_1 \ne b_2$.
Moreover, for any $b_4 > b_3 > b_2 > b_1$ on $B$: $ICC(b_4, b_3) > ICC(b_2, b_1)$.


Remark. Assumption 2 requires that there are no discounts on buying clicks “in
bulk”: for any average position of the bidder in the sponsored search auction, an incremental
increase in the click yield requires a corresponding increase in the payment.
In other words, “there are no free clicks”. In [Athey and Nekipelov 2010] it was shown
that in sponsored search auctions with uncertainty and a reserve price, the condition in
Assumption 2 is satisfied with the lower bound on the incremental cost per click being
the maximum of the bid and the reserve price (conditional on the bidder’s participation
in the auction).
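
As an illustration of how Assumption 2 could be checked on discretized data, the following minimal sketch computes the incremental cost per click over adjacent bid intervals and tests that it is strictly increasing. It assumes a finite bid grid sorted by increasing bid with strictly increasing $\Delta P$ (so that no division by zero occurs); on such a grid, checking adjacent intervals suffices because the ICC over a longer interval is a weighted average of the adjacent ones. The function names and the numbers are hypothetical and are not taken from the data set.

```python
def icc(dP, dC, i, j):
    """Incremental cost per click between grid points i and j (ratio of differences)."""
    return (dC[j] - dC[i]) / (dP[j] - dP[i])


def check_assumption_2(dP, dC):
    """Return (ok, slopes): ok is True when adjacent-interval ICCs strictly increase.

    Assumes dP is strictly increasing along the bid grid (hypothetical setup).
    """
    slopes = [icc(dP, dC, i, i + 1) for i in range(len(dP) - 1)]
    ok = all(s1 < s2 for s1, s2 in zip(slopes, slopes[1:]))
    return ok, slopes


# Illustrative values only: Delta P and Delta C on a grid of four bids.
delta_P = [-0.2, 0.0, 0.2, 0.3]
delta_C = [-0.1, 0.0, 0.3, 0.6]
ok, slopes = check_assumption_2(delta_P, delta_C)
print(ok, slopes)
```

On this toy grid the adjacent ICCs are roughly 0.5, 1.5 and 3, so each additional click is more expensive than the previous one, which is the "no bulk discount" property described in the Remark.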


Lemma 4.5. Under Assumptions 1 and 2, $b^*(v) = \arg\sup_b \left(v\Delta P(b) - \Delta C(b)\right)$ is a
continuous monotone function.

Proof. Under Assumption 1 in Lemma 4.4 we established the monotonicity and
upper-hemicontinuity of the mapping $b^*(\cdot)$. We now strengthen this result to monotonicity
and continuity.

Suppose that for value $v$, the function $v\Delta P(b) - \Delta C(b)$ has an interior maximum and let
$B^*$ be its set of maximizers. First, we show that under Assumption 2 $B^*$ is a singleton.
In fact, suppose that $b_1^*, b_2^* \in B^*$ and wlog $b_1^* > b_2^*$. Suppose that $b \in (b_2^*, b_1^*)$. First, note
that $b$ cannot belong to $B^*$. If it does, then

$$v\Delta P(b_1^*) - \Delta C(b_1^*) = v\Delta P(b) - \Delta C(b) = v\Delta P(b_2^*) - \Delta C(b_2^*),$$

and thus $ICC(b, b_1^*) = ICC(b_2^*, b)$ for $b_2^* < b < b_1^*$, which violates Assumption 2. Second, if
$b \notin B^*$, then

$$v\Delta P(b_1^*) - \Delta C(b_1^*) > v\Delta P(b) - \Delta C(b),$$

and thus, since $\Delta P(b_1^*) > \Delta P(b)$, $v > ICC(b, b_1^*)$. At the same time,

$$v\Delta P(b_2^*) - \Delta C(b_2^*) > v\Delta P(b) - \Delta C(b),$$

and thus, since $\Delta P(b) > \Delta P(b_2^*)$, $v < ICC(b_2^*, b)$. This is impossible under Assumption 2
since it requires that $ICC(b, b_1^*) > ICC(b_2^*, b)$. Therefore, $B^*$ is a singleton.

Now consider $v$ and $v' = v + \delta$ for a sufficiently small $\delta > 0$, with $v$ and $v'$ leading
to an interior maximum of the objective function. By the result of Lemma 4.4,
$b^*(v') \ge b^*(v)$ and by Assumption 2 the inequality is strict. Next note that for any
$b > b^*(v)$, $ICC(b, b^*(v)) > v$ and for any $b < b^*(v)$, $ICC(b, b^*(v)) < v$. By continuity,
$b^*(v)$ solves $ICC(b, b^*(v)) = v$. Now let $b'$ solve $ICC(b', b^*(v)) = v'$. By Assumption 2,
$b' > b^*(v')$ since $b^*(v')$ solves $ICC(b, b^*(v')) = v'$ and $b^*(v') > b^*(v)$. Then by Assumption
2, $ICC(b', b^*(v)) - v > v - ICC(0, b^*(v))$. This means that
$b' - b^*(v) < (v' - v)/(v - ICC(0, b^*(v)))$, and hence
$|b^*(v') - b^*(v)| < |b' - b^*(v)| < \delta/(v - ICC(0, b^*(v)))$. This
means that $b^*(v)$ is continuous in $v$. □

By Lemma 4.5, the interior solutions maximizing $v\Delta P(b) - \Delta C(b)$ are continuous in $v$.
We proved that each boundary point of NR corresponds to the maximizer of this function.
As a result, whenever the maximum is interior, the boundary shares a point
with the support hyperplane $v\Delta P(b^*(v)) - \Delta C(b^*(v)) = \epsilon$. Therefore, the corresponding
normal vector at that boundary point is $(\Delta P(b^*(v)), -1)$.

4.2. Support Function Representation 

Our next step is to use the derived properties of the boundary of the set NR
to compute it. The basic idea is to vary $v$ and compute the bid $b^*(v)$
that maximizes regret. The corresponding maximum value delivers the value of
$\epsilon$ corresponding to the boundary point. Since we are eventually interested
in the properties of the set NR (and the quality of its approximation by the data),
the characterization via the support hyperplanes is not convenient because it
requires solving the computational problem of finding an envelope
function for the set of support hyperplanes. Since closed convex bounded sets are fully
characterized by their boundaries, we can use the notion of the support function to
represent the boundary of the set NR.
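
The boundary-tracing idea described above can be sketched as follows. This is only an illustration on a finite bid grid, not the estimation procedure itself: for each candidate value $v$ it finds the grid bid maximizing $v\Delta P(b) - \Delta C(b)$ and records the attained maximum as the boundary error $\epsilon(v)$. The function name `trace_boundary` and the numbers are hypothetical.

```python
def trace_boundary(bids, dP, dC, values):
    """For each v return (v, b_star, eps) where eps = max_b [v * dP(b) - dC(b)]."""
    points = []
    for v in values:
        regrets = [v * p - c for p, c in zip(dP, dC)]
        # Index of the regret-maximizing deviation on the grid (first one on ties).
        j = max(range(len(bids)), key=lambda i: regrets[i])
        points.append((v, bids[j], regrets[j]))
    return points


# Illustrative grid (hypothetical numbers) satisfying Assumption 2.
bids = [1, 2, 3, 4]
delta_P = [-0.2, 0.0, 0.2, 0.3]
delta_C = [-0.1, 0.0, 0.3, 0.6]
boundary = trace_boundary(bids, delta_P, delta_C, [0.2, 1.0, 2.0, 4.0])
for v, b_star, eps in boundary:
    print(v, b_star, round(eps, 6))
```

On this grid the maximizing bid moves from 1 to 4 as $v$ increases, in line with the monotonicity of $b^*(\cdot)$ in Lemma 4.4.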

Definition 4.6. The support function of a closed convex set $X$ is defined as:

$$h(X, u) = \sup_{x \in X} \langle x, u \rangle,$$

where in our case $X = NR$ is a subset of $\mathbb{R}^2$ of value and error pairs $(v, \epsilon)$, and then $u$
is also an element of $\mathbb{R}^2$.

An important property of the support function is the way it characterizes closed
convex bounded sets. Recall that the Hausdorff distance between subsets $A$ and $B$ of the metric
space $E$ with metric $\rho(\cdot, \cdot)$ is defined as

$$d_H(A, B) = \max\left\{\sup_{a \in A}\inf_{b \in B}\rho(a, b),\ \sup_{b \in B}\inf_{a \in A}\rho(a, b)\right\}.$$

It turns out that $d_H(A, B) = \sup_{\|u\|=1} |h(A, u) - h(B, u)|$. Therefore, if we find $h(NR, u)$, this
will be equivalent to characterizing NR itself.
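
To make the link between support functions and the Hausdorff distance concrete, here is a minimal sketch for two convex polygons in the plane. It assumes the sets are given by their vertices and approximates the supremum over unit vectors by a finite grid of directions; the helper names and the two rectangles are hypothetical illustrations, not objects from the paper.

```python
import math


def support(vertices, u):
    """Support function of a convex polygon: max over vertices of <x, u>."""
    return max(x * u[0] + y * u[1] for x, y in vertices)


def hausdorff_via_support(A, B, n_dirs=3600):
    """Approximate sup_{||u||=1} |h(A,u) - h(B,u)| on an evenly spaced grid of directions."""
    best = 0.0
    for k in range(n_dirs):
        theta = 2.0 * math.pi * k / n_dirs
        u = (math.cos(theta), math.sin(theta))
        best = max(best, abs(support(A, u) - support(B, u)))
    return best


# Two hypothetical convex sets: the unit square and a rectangle twice as wide.
unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
wide_rect = [(0, 0), (2, 0), (2, 1), (0, 1)]
d = hausdorff_via_support(unit_square, wide_rect)
print(d)
```

The farthest point of the wide rectangle from the unit square lies at distance 1, and the support functions differ by exactly that amount in the direction $u = (1, 0)$.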

We note that the set NR is not bounded: the larger the error $\epsilon$ that the player
can make, the wider the range of values that will rationalize the observed bids. We
consider the restriction of the set NR to the set where we bound the error for the
players. Moreover, we explicitly assume that the values per click of bidders have to be
non-negative.

Assumption 3.

(i) The rationalizable values per click are non-negative: $v \ge 0$.

(ii) There exists a global upper bound $\bar\epsilon$ for the error of all players.

Remark. While non-negativity of the value may be straightforward in many auction
contexts, the assumption of an upper bound for the error may be less straightforward.
One way to get such an error bound is to assume that the values come from a
bounded range $[0, M]$ (see the Remark after Assumption 1); then an error $\epsilon > M$
would correspond to negative utility for the player, and players may choose to exit
the market if their advertising results in negative value.

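Theorem 4.7. Under Assumptions 1 and 2 the support function of NR is

$$h(NR, u) = \begin{cases} +\infty, & \text{if } u_2 \ge 0, \text{ or } \frac{u_1}{|u_2|} \notin \left[\inf_b \Delta P(b), \sup_b \Delta P(b)\right], \\ |u_2|\,\Delta C\left(\Delta P^{-1}\left(\frac{u_1}{|u_2|}\right)\right), & \text{if } u_2 < 0 \text{ and } \frac{u_1}{|u_2|} \in \left[\inf_b \Delta P(b), \sup_b \Delta P(b)\right]. \end{cases}$$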

Proof. Provided that the support function is positive homogeneous, without loss
of generality we can set $u = (u_1, u_2)$ with $\|u\| = 1$. To find the support function, we take
$u_1$ to be dual to $v_i$ and $u_2$ to be dual to $\epsilon_i$. We re-write the inequality of the half-plane
as: $v_i \cdot \Delta P(b_i) - \epsilon_i \le \Delta C(b_i)$. We need to evaluate the inner product

$$u_1 v_i + u_2 \epsilon_i$$

from above. We note that to evaluate the support function for $u_2 \ge 0$, the corresponding
inequality for the half-plane needs to “flip” to $\ge -\Delta C(b_i)$. This means that for $u_2 \ge 0$,
$h(NR, u) = +\infty$. Next, for any $u_2 < 0$ we note that the inequality for the half-plane can
be re-written as: $v_i |u_2| \Delta P(b_i) + u_2 \epsilon_i \le |u_2| \Delta C(b_i)$.

As a result, if there exists $b_i$ such that for a given $u$: $u_1/|u_2| = \Delta P(b_i)$, then $|u_2| \Delta C(b_i)$
corresponds to the support function for this $b_i$.

Now suppose that $\sup_b \Delta P(b) > 0$ and $u_1 > |u_2| \sup_b \Delta P(b)$. In this case we can
evaluate

$$u_1 v_i + u_2 \epsilon_i = \left(u_1 - |u_2| \sup_b \Delta P(b)\right) v_i + |u_2| \sup_b \Delta P(b)\, v_i + u_2 \epsilon_i.$$

The function $|u_2| \sup_b \Delta P(b)\, v_i + u_2 \epsilon_i$ is bounded by $|u_2| \sup_b \Delta C(b)$ for each $(v_i, \epsilon_i) \in NR$.
At the same time, the function $(u_1 - |u_2| \sup_b \Delta P(b)) v_i$ is strictly increasing in $v_i$ on NR.
As a result, the support function evaluated at any vector $u$ with $u_1/|u_2| > \sup_b \Delta P(b)$
is $h(NR, u) = +\infty$. □

This behavior of the support function can be explained intuitively. The set NR is
a convex set in the $\epsilon > 0$ half-space. The unit-length $u$ corresponds to the normal vector
to the boundary of NR and $-h(NR, u)$ is the point of intersection of the $\epsilon$ axis by
the tangent line. Asymptotically, the boundaries of the set approach the boundary lines of
the half-planes $v_i \sup_b \Delta P(b) - \epsilon \le \sup_b \Delta C(b)$ and $v_i \inf_b \Delta P(b) - \epsilon \le \inf_b \Delta C(b)$.
First of all, this means that the $u_2$ coordinate of the normal vector of any line that can be
tangent to NR has to be negative. If that coordinate is positive, the line can only
intersect the set NR. Second, the maximum slope of the normal vector to the tangent
line is $\sup_b \Delta P(b)$ and the minimum is $\inf_b \Delta P(b)$. Any line with a steeper slope will
intersect the set NR.
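
A minimal sketch of how the support function in Theorem 4.7 could be evaluated on a finite grid is given below. It assumes that $\Delta P$ is strictly increasing along the grid and approximates $f = \Delta C(\Delta P^{-1}(\cdot))$ by linear interpolation between grid points; the interpolation, the function name `support_NR` and the numbers are assumptions made for illustration only.

```python
import math


def support_NR(u1, u2, dP, dC):
    """Evaluate h(NR, u) following the case split of Theorem 4.7.

    dP must be sorted in strictly increasing order with dC aligned to it
    (hypothetical grid); f = dC(dP^{-1}(.)) is linearly interpolated.
    """
    if u2 >= 0:
        return math.inf
    z = u1 / abs(u2)
    if z < dP[0] or z > dP[-1]:
        return math.inf
    for j in range(len(dP) - 1):
        if dP[j] <= z <= dP[j + 1]:
            w = (z - dP[j]) / (dP[j + 1] - dP[j])
            return abs(u2) * ((1 - w) * dC[j] + w * dC[j + 1])
    return abs(u2) * dC[-1]  # reached only for a single-point grid


# Illustrative values only.
delta_P = [-0.2, 0.0, 0.2, 0.3]
delta_C = [-0.1, 0.0, 0.3, 0.6]
print(support_NR(0.2, -1.0, delta_P, delta_C))
print(support_NR(0.0, 1.0, delta_P, delta_C))
print(support_NR(1.0, -1.0, delta_P, delta_C))
```

The last two calls illustrate the infinite branches: a direction with $u_2 \ge 0$, and a direction whose slope $u_1/|u_2|$ exceeds $\sup_b \Delta P(b)$.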

Now we consider the restriction of the set NR via Assumption 3. The additional
restriction on what is rationalizable does not change its convexity, but it makes the
resulting set bounded. Denote this bounded subset of NR by $NR_B$. The following theorem
characterizes the structure of the support function of this set.

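Theorem 4.8. Let $\bar v$ correspond to the furthest point of the set $NR_B$ from the $\epsilon$ axis.
Suppose that $\bar b$ is such that $\bar v \Delta P(\bar b) - \bar\epsilon = \Delta C(\bar b)$. Also suppose that $\underline b$ corresponds to the point
of intersection of the boundary of the set NR with the vertical axis $v = 0$. Under Assumptions
1, 2, and 3 the support function of $NR_B$ is

$$h(NR_B, u) = \begin{cases} h(NR, u), & \text{if } u_2 < 0,\ \Delta P(\underline b) < u_1/|u_2| < \Delta P(\bar b), \\ u_1 \bar v + u_2 \bar\epsilon, & \text{if } u_2 < 0,\ u_1/|u_2| > \Delta P(\bar b), \\ u_2 \underline\epsilon, & \text{if } u_2 < 0,\ \Delta P(\underline b) > u_1/|u_2|, \\ u_1 \bar v + u_2 \bar\epsilon, & \text{if } u_1, u_2 \ge 0, \\ u_2 \bar\epsilon, & \text{if } u_1 < 0,\ u_2 \ge 0. \end{cases}$$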


Proof. We begin with $u_2 < 0$. In this case when $\Delta P(\underline b) < u_1/|u_2| < \Delta P(\bar b)$,
the corresponding normal vector $u$ is on the part of the boundary of the set $NR_B$
that coincides with the boundary of NR. Thus for this set of normal vectors

$$h(NR_B, u) = h(NR, u).$$

Suppose that $u_2 < 0$ while $u_1/|u_2| > \Delta P(\bar b)$. In this case the support hyperplane is
centered at the point $(\bar v, \bar\epsilon)$ for the entire range of angles of the normal vectors. Provided
that the equation for each such support hyperplane corresponds to $\frac{u_1}{|u_2|} v - \epsilon = c$ and
each needs to pass through $(\bar v, \bar\epsilon)$, the support function is $h(NR_B, u) = u_1 \bar v + u_2 \bar\epsilon$.

Suppose that $u_2 < 0$ and $u_1/|u_2| < \Delta P(\underline b)$. Then the support hyperplanes will
be centered at the point $(0, \underline\epsilon)$. The corresponding support function can be expressed as
$h(NR_B, u) = u_2 \underline\epsilon$.

Now suppose that $u_2 \ge 0$ and $u_1 \ge 0$. Then the support hyperplanes are centered at
$(\bar v, \bar\epsilon)$ and again the support function will be $h(NR_B, u) = u_1 \bar v + u_2 \bar\epsilon$.

Finally, suppose that $u_2 \ge 0$ and $u_1 < 0$. Then the support hyperplanes are centered
at $(0, \bar\epsilon)$. The corresponding support function is $h(NR_B, u) = u_2 \bar\epsilon$. □


5. INFERENCE FOR RATIONALIZABLE SET 

Note that to construct the support function (and thus fully characterize the set $NR_B$)
we only need to evaluate the function $\Delta C(\Delta P^{-1}(\cdot))$. It is a one-dimensional function
and can be estimated from the data via direct simulation. We have previously noted
that our goal is to characterize the distance between the true set $NR_B$ and the set $\widehat{NR}_B$
that is obtained from subsampling the data. Denote

$$f(\cdot) = \Delta C\left(\Delta P^{-1}(\cdot)\right)$$

and let $\hat f(\cdot)$ be its estimate from the data. The set $NR_B$ is characterized by its support
function $h(NR_B, u)$, which is determined by the exogenous upper bound on the error
$\bar\epsilon$, the intersection point of the boundary of NR with the vertical axis $(0, \underline\epsilon)$, and the
highest rationalizable value $\bar v$.

We note that the set NR lies inside the shifted cone defined by the half-spaces
$v \sup_b \Delta P(b) - \epsilon \le \sup_b \Delta C(b)$ and $v \inf_b \Delta P(b) - \epsilon \le \inf_b \Delta C(b)$. Thus the value $\bar v$ can be
upper bounded by the intersection of the line $v \sup_b \Delta P(b) - \epsilon = \sup_b \Delta C(b)$ with $\epsilon = \bar\epsilon$.
The support function corresponding to this point can therefore be upper bounded by
$|u_2| \sup_b \Delta C(b)$.

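Then we notice that

$$d_H(\widehat{NR}_B, NR_B) = \sup_{\|u\|=1} \left|h(\widehat{NR}_B, u) - h(NR_B, u)\right|.$$

For the evaluation of the sup-norm of the difference between the support functions, we
split the circle $\|u\| = 1$ into the areas where the support function is linear and non-linear,
as determined by the function $f(\cdot)$. For the non-linear segment the sup-norm can be upper
bounded by the global sup-norm

$$\sup_{u_2 < 0,\ \Delta P(\underline b) < u_1/|u_2| < \Delta P(\bar b)} \left|h(\widehat{NR}_B, u) - h(NR_B, u)\right| \le \sup_{\|u\|=1} \left| |u_2| \hat f\left(\frac{u_1}{|u_2|}\right) - |u_2| f\left(\frac{u_1}{|u_2|}\right) \right| \le \sup_z \left|\hat f(z) - f(z)\right|.$$

For the linear part, provided that the value $\bar\epsilon$ is fixed, we can evaluate the norm from
above by

$$\sup_{\|u\|=1} \left| |u_2| \sup_b \widehat{\Delta C}(b) - |u_2| \sup_b \Delta C(b) \right| \le \sup_b \left|\widehat{\Delta C}(b) - \Delta C(b)\right| \le \sup_z \left|\hat f(z) - f(z)\right|.$$

Thus we can evaluate

$$d_H(\widehat{NR}_B, NR_B) \le \sup_z \left|\hat f(z) - f(z)\right|. \qquad (16)$$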

Thus, the bounds that can be established for estimation of the function $f$ directly imply
the bounds for estimation of the Hausdorff distance between the estimated and the
true sets $NR_B$. We assume that a sample of size $N = n \times T$ is available (where $n$ is
the number of auctions sampled per period and $T$ is the number of periods). We now
establish the general rate result for the estimation of the set $NR_B$.

Theorem 5.1. Suppose that the function $f$ has derivatives up to order $k \ge 0$ and for
some $L \ge 0$

$$\left|f^{(k)}(z_1) - f^{(k)}(z_2)\right| \le L |z_1 - z_2|^{\alpha}.$$

Under Assumptions 1, 2, and 3 we have

$$d_H(\widehat{NR}_B, NR_B) \le O\left((N^{-1} \log N)^{\gamma/(2\gamma+1)}\right), \quad \gamma = k + \alpha.$$

Remark. The theorem makes the further assumption that the function $f$ is $k$ times
differentiable and satisfies a Lipschitz-style bound with parameter $L \ge 0$ and exponent $\alpha$. We
note that in the special case of $k = 0$ the theorem does not require
differentiability of the functions $\Delta P(\cdot)$ and $\Delta C(\cdot)$. If these functions are Lipschitz, the
condition of the theorem is satisfied with $k = 0$ and $\alpha = 1$, and the theorem provides an
$O((N^{-1} \log N)^{1/3})$ convergence rate for the estimated set $\widehat{NR}$.

Proof. By (16) the error in the estimation of the set $NR_B$ is fully characterized
by the uniform error in the estimation of the single-dimensional function $f(\cdot)$. In part
(i) of the Theorem our assumption is that we estimate the function $f(\cdot)$ from the class
of convex functions. In part (ii) of the Theorem our assumption is that we estimate
the function from the class of $k$ times differentiable single-dimensional functions. We
now use the results for optimal global convergence rates for estimation of functions
from these respective classes. We note that these rates do not depend on the particular
chosen estimation method and thus provide a global bound on the convergence of the
estimated set $\widehat{NR}_B$ to the true set. By [Stone 1982], the global uniform convergence
rate for estimation of an unknown function with $k$ derivatives whose $k$-th derivative is
Hölder-continuous with constant $\alpha$ is $(N^{-1} \log N)^{\gamma/(2\gamma+1)}$ with $\gamma = k + \alpha$. That delivers
our statement. □

We note that this theorem does not require differentiability of the functions $\Delta P(\cdot)$
and $\Delta C(\cdot)$. For instance, if these functions are Lipschitz, the theorem provides a
$(N^{-1} \log N)^{1/3}$ convergence rate for the estimated set $\widehat{NR}$.
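
For a sense of scale, the rate in Theorem 5.1 can be evaluated numerically. The sketch below only computes the expression $(N^{-1}\log N)^{\gamma/(2\gamma+1)}$ with $\gamma = k + \alpha$; it ignores the constant hidden in the $O(\cdot)$ notation, and the sample sizes are hypothetical.

```python
import math


def hausdorff_rate(N, k, alpha):
    """Rate (log N / N)^(gamma / (2 gamma + 1)) with gamma = k + alpha (constant omitted)."""
    gamma = k + alpha
    return (math.log(N) / N) ** (gamma / (2 * gamma + 1))


# Lipschitz case (k = 0, alpha = 1): the exponent is 1/3.
for N in (10**4, 10**6, 10**8):
    print(N, hausdorff_rate(N, 0, 1.0))
```

A smoother $f$ (larger $k$) gives an exponent closer to 1/2 and therefore a faster rate for the same sample size.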






Fig. 1. NR set for two listings of a high-frequency bid changing advertiser. Values are normalized to 1. The
tangent line selects our point prediction.


6. DATA ANALYSIS 

We apply our approach to infer the valuations and regret constants of a set of advertisers
from the search Ads system of Bing. Our focus is on advertisers who change their
bids frequently (up to multiple bid changes per hour) and thus are more likely to use
algorithms for their bid adjustment instead of changing those bids manually. Each
advertiser corresponds to an “account”. Accounts have multiple listings corresponding to
individual ads. The advertisers can set the bids for each listing within the account
separately. We examine nine high frequency accounts from the same industry sector for a
period of a week and apply our techniques to all the listings within each account. The
considered market is highly dynamic, with new advertising campaigns launched
on a daily basis (there is also a substantial amount of experimentation on the
auction platform side, but it contributes less to the uncertainty regarding the
market than the uncertainty of competitors’ actions). We focus on analyzing the bid
dynamics across the listings within the same account as they are most probably instances
of the same learning algorithm. Hence the only things that should vary across these
listings are the bidding environment and the valuations. Therefore, statistics across
such listings capture in some sense the statistical behavior of the learning algorithm
when the environment and the valuation per bid are drawn from some distribution.

Computation of the Empirical Rationalizable Set. We start by briefly describing
the procedure that we constructed to compute the set NR for a single listing. We
assumed that bids and values have a hard absolute upper bound and, since they are
constrained to be multiples of pennies, the strategy space of each advertiser is now a
finite set, rather than the set $\mathbb{R}_+$. Thus for each possible deviating bid $b'$ in this finite
set we compute $\Delta P(b')$ and $\Delta C(b')$ for each listing. We then discretize the space of
additive errors. For each additive error $\epsilon$ in this discrete space we use the characterization
in Lemma 4.2 to compute the upper and lower bound on the value for this additive
error. This involves taking a maximum and a minimum over a finite set of alternative
bids $b'$. We then look at the smallest epsilon $\epsilon_0$ where the lower bound is smaller than
the upper bound; this corresponds to the smallest rationalizable epsilon. For every
epsilon larger than $\epsilon_0$ we plot the upper bound function and the lower bound function.
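
The procedure above can be sketched as follows. This is a simplified illustration rather than the code used for the analysis: it assumes the constraints take the half-plane form $v\Delta P(b') - \epsilon \le \Delta C(b')$ used in the proof of Theorem 4.7, imposes $v \ge 0$ as in Assumption 3, and uses a hypothetical two-deviation example with made-up numbers.

```python
import math


def value_bounds(eps, dP, dC):
    """Interval of values v with v * dP(b') <= dC(b') + eps for every deviation b'.

    Returns (lower, upper), or None when no non-negative value is rationalizable.
    """
    lower, upper = 0.0, math.inf
    for p, c in zip(dP, dC):
        if p > 0:
            upper = min(upper, (c + eps) / p)   # deviation that gains clicks
        elif p < 0:
            lower = max(lower, (c + eps) / p)   # deviation that loses clicks
        elif c + eps < 0:                        # dP = 0: need 0 <= dC + eps
            return None
    # Small tolerance guards against floating-point noise at the tangency point.
    return (lower, upper) if lower <= upper + 1e-12 else None


def rationalizable_set(eps_grid, dP, dC):
    """Map each rationalizable additive error on the grid to its value interval."""
    result = {}
    for eps in eps_grid:
        bounds = value_bounds(eps, dP, dC)
        if bounds is not None:
            result[eps] = bounds
    return result


# Hypothetical listing with two alternative bids (illustrative numbers only).
delta_P = [0.2, -0.2]
delta_C = [0.2, -0.6]
eps_grid = [k * 0.05 for k in range(11)]
nr_set = rationalizable_set(eps_grid, delta_P, delta_C)
eps0 = min(nr_set)
print(eps0, nr_set[eps0])
```

In this toy example no value is rationalizable below $\epsilon_0 = 0.2$, where the upper and lower bounds meet at $v = 2$; for larger errors the interval of rationalizable values widens.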

An example of the resulting set NR for a high frequency listing of one of the advertisers
we analyzed is depicted in Figure 1. From the figure, we observe that the right
listing has a higher regret than the left one. Specifically, the smallest rationalizable
additive error is further from zero. Upon examination of the bid change pattern,
the right listing in the Figure was in a more transient period where the bid of the
advertiser was increasing, hence this could have been a period where the advertiser was
experimenting. The bid pattern of the left listing was more stable.





Point Prediction: Smallest Multiplicative Error. Since the two dimensional rationalizable
set NR is hard to depict as a summary statistic for a large number of listings,
we instead opt for a point prediction: we compute the point of the NR
set that has the smallest regret error.

Since the smallest possible additive error is hard to compare across listings, we instead
pick the smallest multiplicative error that is rationalizable, i.e. a pair $(\delta, v)$ such
that:

$$\forall b' \in B: \quad \frac{1}{T}\sum_{t=1}^{T} U_i(b^t; \theta^t, v_i) \ge (1 - \delta)\, \frac{1}{T}\sum_{t=1}^{T} U_i(b', b^t_{-i}; \theta^t, v_i) \qquad (17)$$

and such that $\delta$ is the smallest possible value that admits such a value per click $v$. Denote
with $P^0 = \frac{1}{T}\sum_{t=1}^{T} P_i^t(b^t)$ the observed average probability of click of the advertiser and
with $C^0 = \frac{1}{T}\sum_{t=1}^{T} C_i^t(b^t)$ the observed average cost. Then by simple re-arrangements
the latter constraint is equivalent to:

$$\forall b' \in B: \quad v\Delta P(b') \le \Delta C(b') + \frac{\delta}{1-\delta}\left(vP^0 - C^0\right) \qquad (18)$$

By comparing this result to Equation (14), a multiplicative error of $\delta$ corresponds to an
additive error of $\epsilon = \frac{\delta}{1-\delta}(vP^0 - C^0)$.

Hence, one way to compute the feasible region of values for a multiplicative error $\delta$
from the NR set that we estimated is to draw the line $\epsilon = \frac{\delta}{1-\delta}(vP^0 - C^0)$. The two
points where this line intersects the NR set correspond to the upper and lower bound
on the valuation. Then the smallest multiplicative error $\delta$ corresponds to the line that
simply touches the NR set, i.e. the smallest $\delta$ for which the intersection of the line
$\epsilon = \frac{\delta}{1-\delta}(vP^0 - C^0)$ with the NR set is non-empty. This line is depicted in orange in Figure 1 and the
point appears with a black mark.

Computationally, we estimate this point by binary search on the values of $\delta \in [0, 1]$,
until the gap between the upper and lower bounds on $\delta$ in the binary search is smaller than some
pre-defined precision. Specifically, for each given $\delta$, the upper and lower bounds on the values
implied by the constraints in Equation (18) are:

$$\max_{b':\ (1-\delta)\Delta P(b') - \delta P^0 < 0} \frac{(1-\delta)\Delta C(b') - \delta C^0}{(1-\delta)\Delta P(b') - \delta P^0} \ \le\ v\ \le\ \min_{b':\ (1-\delta)\Delta P(b') - \delta P^0 > 0} \frac{(1-\delta)\Delta C(b') - \delta C^0}{(1-\delta)\Delta P(b') - \delta P^0}$$

If these inequalities have a non-empty intersection for some value of $\delta$, then they have
a non-empty intersection for any larger value of $\delta$ (as is implied by our graphical
interpretation in the previous paragraph).

Thus we can do a binary search over the values of $\delta \in (0, 1)$. At each iteration we
have an upper bound $H$ and lower bound $L$ on $\delta$. We test whether $\delta = (H + L)/2$ gives a
non-empty intersection in the above inequalities. If it does, then we decrease the upper
bound $H$ to $(H + L)/2$; if it doesn’t, then we increase the lower bound $L$ to $(H + L)/2$. We
stop whenever $H - L$ is smaller than some precision, or when the gap between the implied upper and
lower bounds on $v$ from the feasible region for $\delta = H$ is smaller than some precision.
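
A minimal sketch of this binary search is given below. It follows the bounds implied by Equation (18), adds the restriction $v \ge 0$ from Assumption 3, and stops on the gap $H - L$ only. The function names, the tolerance and the numerical example are hypothetical and serve only to illustrate the steps.

```python
import math


def bounds_for_delta(delta, dP, dC, P0, C0):
    """Value interval implied by Equation (18) for a given multiplicative error delta."""
    lower, upper = 0.0, math.inf
    for p, c in zip(dP, dC):
        num = (1 - delta) * c - delta * C0
        den = (1 - delta) * p - delta * P0
        if den > 0:
            upper = min(upper, num / den)
        elif den < 0:
            lower = max(lower, num / den)
        elif num < 0:           # den = 0: the constraint reads 0 <= num
            return None
    return (lower, upper) if lower <= upper else None


def smallest_delta(dP, dC, P0, C0, tol=1e-9):
    """Binary search for the smallest delta in [0, 1] with a non-empty value interval."""
    L, H = 0.0, 1.0
    if bounds_for_delta(L, dP, dC, P0, C0) is not None:
        return L, bounds_for_delta(L, dP, dC, P0, C0)
    while H - L > tol:
        mid = (H + L) / 2
        if bounds_for_delta(mid, dP, dC, P0, C0) is not None:
            H = mid             # feasible: tighten the upper bound on delta
        else:
            L = mid             # infeasible: raise the lower bound on delta
    return H, bounds_for_delta(H, dP, dC, P0, C0)


# Hypothetical listing: two alternative bids, observed average click
# probability P0 and observed average cost C0 (illustrative numbers only).
delta_P = [0.2, -0.2]
delta_C = [0.2, -0.6]
delta_star, (v_low, v_high) = smallest_delta(delta_P, delta_C, P0=0.5, C0=0.25)
v_star = (v_low + v_high) / 2
print(delta_star, v_star)
```

In this toy example the search converges to $\delta^* = 4/19 \approx 0.21$, where the upper and lower bounds on the value meet at $v^* = 2$, which is the tangency point described in the graphical interpretation.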

The value that corresponds to the smallest rationalizable multiplicative error $\delta$ can
be viewed as a point prediction for the value of the player. It is exactly the value that
corresponds to the black dot in Figure 1. Since this is a point of the NR set, the
estimate of this point from data has at least the same convergence
properties as the whole NR set that we derived in Section 5.

Experimental Findings. We compute the pair of the smallest rationalizable multiplicative
error $\delta^*$ and the corresponding predicted value $v^*$ for every listing of each
of the nine accounts that we analyzed. In Figure 2, on the right we plot the distribution
of the smallest non-negative rationalizable error over the listings of a single
account. Different listings of a single account are driven by the same algorithm, hence
we view this plot as the “statistical footprint” of this algorithm. We observe that all
accounts have a similar pattern in terms of the smallest rationalizable error: almost
30% of the listings within the account can be rationalized with an almost zero error,
i.e. $\delta^* < 0.0001$. We note that regret can also be negative, and in the figure we group
together all listings with a negative smallest possible error. This group contains 30% of the
listings. The empirical distribution of the regret constant $\delta^*$ for the remaining 70% of
the listings is close to the uniform distribution on [0, 0.4]. Such a pattern suggests that
half of the listings of an advertiser have reached a stable state, while a large fraction
of them are still in a learning state.

Fig. 2. Histogram of the ratio of predicted value over the average bid in the sequence and the histogram
of the smallest non-negative rationalizable multiplicative error $\delta$ (the smallest bucket contains all listings
with a non-positive smallest rationalizable error).

We also analyze the amount of relative bid shading. It has been previously observed
that bidding in the GSP auctions is not truthful. Thus we can empirically measure
the difference between the bids and estimated values of advertisers associated with
different listings. Since the bid of a listing is not constant in the period of the week
that we analyzed, we use the average bid as a proxy for the bid of the advertiser. Then
we compute the ratio of the average bid over the predicted value for each listing. We
plot the distribution of this ratio over the listings of a typical account in the left plot
of Figure 2. We observe that based on our prediction, advertisers consistently bid on
average around 60% of their value. Though the amount of bid shading does have some
variance, the empirical distribution of the ratio of average bid and the estimated value
is close to a normal distribution with mean around 60%.¹
Figure 2 depicts the ratio distribution for one account. We give similar plots for all
other accounts that we analyzed in the full version of the paper.

Last, we also analyze whether there is any correlation between the smallest rationalizable
error and the amount of underbidding for listings that seem to be in a learning phase
(i.e. have $\delta^*$ greater than some tiny amount). We present a scatter plot of the pairs
$(\delta^*, v^*)$ in Figure 3 for a single account and for listings that have $\delta^* > 0.0001$.
Though there does not seem to be a significant correlation, we consistently observe a
small correlation: listings with higher error shade their bid more.

Fig. 3. Scatter plot of pairs $(v^*, \delta^*)$ for all listings of a single advertiser.

¹We do observe that for a very small number of listings within each account our prediction implies a small
amount of overbidding on average. These outliers could potentially be because our main assumption that
the value remains fixed throughout the period of the week doesn’t hold for these listings due to some market
shock.
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